We demonstrate a new framework for analyzing and controlling distributed systems, by solving constrained optimization problems with an algorithm based on that framework. The framework is an information-theoretic extension of conventional full-rationality game theory to allow bounded rational agents. The associated optimization algorithm is a game in which agents control the variables of the optimization problem. They do this by jointly minimizing a Lagrangian of (the probability distribution of) their joint state. The updating of the Lagrange parameters in that Lagrangian is a form of automated annealing, one that focuses the multi-agent system on the optimal pure strategy. We present computer experiments for the k-sat constraint satisfaction problem and for unconstrained minimization of NK functions.


1.1 Introduction 

Recently a new framework for analyzing and controlling distributed systems has been developed [6, 7, 8]. This framework starts with a parameterized class of probability distributions, $Q$, across the joint state of the variables of the system. A Lagrangian function of $q \in Q$ is minimized to determine a $q$ over the variables of the distributed system. We consider the special case of this probability Lagrangian framework in which $Q$ is the set of product distributions.

A strength of the framework is the connections it makes between disciplines. For example, it can be motivated by using information theory to relate bounded rational game theory to statistical physics [6, 7]. In a noncooperative game the agents are statistically independent at any stage of the game, with each agent $i$ choosing its move $x_i$ by sampling its probability distribution (mixed strategy) at that instant, $q_i(x_i)$; the distribution of the joint moves is a product distribution $q(x) = \prod_i q_i(x_i)$ with $x \in X$. Inter-agent coupling occurs indirectly, across time, via the updating of the $\{q_i\}$ at the end of each stage. Information theory shows that the (bounded rational) equilibrium of the game is the $q$ optimizing an associated Lagrangian $\mathcal{L}(q)$.

For some games the optimal $q \in Q$ is the minimizer of the Kullback-Leibler (KL) distance to a distribution $p$, $D(q\|p) = \sum_x q(x) \ln(q(x)/p(x))$ [1], where $p$ is one of the variants of the canonical ensemble of statistical physics. In other words, the Lagrangian in such cases is $D(q\|p)$ for an associated $p$ from statistical physics. In particular, for $Q$ being the set of product distributions, the bounded rational equilibrium of the game is a mean-field approximation to $p$.

When the agents share the same utility function $-G(x)$, the optimizer of $\mathcal{L}(q)$ is the distribution that minimizes the expected value of $G$, subject to any provided constraints and to an overall entropy value that sets the rationalities of the agents. Moreover, the updating of the $q_i$ at the end of each stage of the game can be designed to be a search process for an optimal $q$. For example, since $q$ is a vector in a Euclidean space, the search can be done with continuous techniques like gradient descent or Newton's method, even if $X$ is a categorical, finite space. Under such updating, the evolution of the game serves as a distributed constrained optimization algorithm. Note how this contrasts with most stochastic optimization algorithms (e.g. simulated annealing). Those algorithms use probability distributions to help guide search for points $x = [x_1, \dots, x_N] \in X$ optimizing a function $G(x)$. In contrast, we search over distributions directly.

In many optimization problems, particularly Constraint Satisfaction Problems (CSPs), we want to find multiple solutions $q$. Multiple runs of the game outlined above might not find different $q$. Here we show how to construct a single game to obtain multiple distinct solutions at once. The approach is to reparameterize $X$ so that a product distribution over the new parameters corresponds to a coupled distribution across $X$. We consider such a reparameterization that results in a mixture of $M$ product distributions $q(x) = \sum_{m=1}^{M} q_0(m) q^m(x)$. As described below, the associated Lagrangian "pushes" the separate $q^m(x)$ apart.

We begin in §1.2 by elaborating our Lagrangian for mixture models, and consider simple methods to minimize this Lagrangian in §1.3. Experimental validation is presented for k-satisfiability (§1.4.1) and NK (§1.4.2) problems.


1.2 Specifying the Lagrangian 

To specify the Lagrangian we must first fix the distribution $p(x)$ we wish to get as close to as possible (in KL distance). If the objective function we wish to minimize is $G(x)$ (i.e., $G$ is the negative of the utility shared by the bounded rational agents) then we consider the $T$-parameterized Boltzmann distribution $p(x) = \exp(-G(x)/T)/Z(T)$. (At low $T$, i.e. high rationalities, this distribution is concentrated on $x$ having low $G$ values.) Up to an additive constant that does not depend on $q$, the KL distance to this $p$ is proportional to

$$\mathcal{L}(q) = E_q[G] - T S(q) \tag{1.1}$$

where $E_q[G] = \sum_x q(x) G(x)$, and $S(q) = -\sum_x q(x) \ln q(x)$ is the entropy of $q$. For product $q$'s, $S(q) = \sum_i S(q_i)$ where $S(q_i) = -\sum_{x_i} q_i(x_i) \ln q_i(x_i)$.
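The relation between (1.1) and the KL distance can be checked numerically on a toy problem. The sketch below is illustrative only: the helper names (`product_prob`, `lagrangian`, `kl_to_boltzmann`), the three-variable objective, and the brute-force enumeration are assumptions made for this example, not part of the framework itself. It verifies that $T\,D(q\|p) = \mathcal{L}(q) + T \ln Z(T)$.

```python
import itertools
import math

def product_prob(q, x):
    """Probability of joint state x under the product distribution q."""
    p = 1.0
    for qi, xi in zip(q, x):
        p *= qi[xi]
    return p

def entropy(q):
    """S(q) = sum_i S(q_i) for a product distribution."""
    return -sum(p * math.log(p) for qi in q for p in qi if p > 0)

def lagrangian(q, G, T):
    """L(q) = E_q[G] - T S(q), with E_q[G] computed by enumeration."""
    states = itertools.product(*[range(len(qi)) for qi in q])
    expected_G = sum(product_prob(q, x) * G(x) for x in states)
    return expected_G - T * entropy(q)

def kl_to_boltzmann(q, G, T):
    """Return D(q||p) for p(x) = exp(-G(x)/T)/Z(T), together with ln Z(T)."""
    states = list(itertools.product(*[range(len(qi)) for qi in q]))
    log_Z = math.log(sum(math.exp(-G(x) / T) for x in states))
    kl = 0.0
    for x in states:
        qx = product_prob(q, x)
        if qx > 0:
            # ln q(x) - ln p(x), with ln p(x) = -G(x)/T - ln Z
            kl += qx * (math.log(qx) + G(x) / T + log_Z)
    return kl, log_Z

# Hypothetical toy objective: three binary variables, G counts the ones.
G = lambda x: float(sum(x))
q = [[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]]
T = 0.5
kl, log_Z = kl_to_boltzmann(q, G, T)
```

Since $\ln Z(T)$ does not depend on $q$, minimizing $\mathcal{L}(q)$ and minimizing $D(q\|p)$ select the same $q$.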

Since we are interested in problems with constraints, it is natural to write $G(x) = O(x) + \sum_a \lambda_a c_a(x)$ where $O$ is an objective to be minimized, and the $c_a$ are a set of constraint functions that are required to be non-negative. The $\lambda_a$ are Lagrange multipliers that are used to enforce the constraints. (In CSPs $O(x) = 0$.)

As mentioned above, we consider distributions of the form $q(x) = \sum_{m=1}^{M} q_0(m) q^m(x)$ where $\sum_m q_0(m) = 1$ and $q^m(x) = \prod_i q_i^m(x_i)$. Substituting this into (1.1) gives the mixture Lagrangian

$$L(q) = \sum_m q_0(m) E_{q^m}[G] - T S(q) = \sum_m q_0(m) \mathcal{L}(q^m) - T J(q) \tag{1.2}$$

with $\mathcal{L}(q^m)$ given by (1.1) and $J(q) \ge 0$ being the Jensen-Shannon (JS) distance,

$$J(q) = S\Big(\sum_m q_0(m) q^m\Big) - \sum_m q_0(m) S(q^m) = -\sum_m \sum_x q_0(m) q^m(x) \ln \frac{q(x)}{q^m(x)}.$$

The JS term pushes the optimal $q^m$ to differ from each other. Unfortunately, it also couples all variables (because of the sum inside the logarithm), preventing a distributed solution. Thus, we replace $J$ with another function which lower-bounds $J$ and which requires less communication between agents.
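To make the role of the JS term concrete, the following sketch evaluates $J(q)$ by brute-force enumeration for a two-component mixture over two binary variables. The function names and the example distributions are assumptions chosen for illustration. It shows that $J$ vanishes when the components coincide and is positive when they differ.

```python
import itertools
import math

def product_prob(qm, x):
    """q^m(x) = prod_i q_i^m(x_i)."""
    p = 1.0
    for qi, xi in zip(qm, x):
        p *= qi[xi]
    return p

def entropy_of(probs):
    """Shannon entropy of an explicit list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def js_distance(q0, comps):
    """J(q) = S(sum_m q0(m) q^m) - sum_m q0(m) S(q^m), by enumeration."""
    sizes = [len(qi) for qi in comps[0]]
    states = list(itertools.product(*[range(n) for n in sizes]))
    mix = [sum(w * product_prob(qm, x) for w, qm in zip(q0, comps))
           for x in states]
    s_comps = sum(w * entropy_of([product_prob(qm, x) for x in states])
                  for w, qm in zip(q0, comps))
    return entropy_of(mix) - s_comps

q0 = [0.5, 0.5]
a = [[0.9, 0.1], [0.8, 0.2]]   # component concentrated near x = (0, 0)
b = [[0.1, 0.9], [0.3, 0.7]]   # component concentrated near x = (1, 1)
j_distinct = js_distance(q0, [a, b])
j_same = js_distance(q0, [a, a])
```

Because $J$ enters (1.2) with a negative sign, minimizing $L$ rewards mixtures whose components are spread apart. The enumeration over all joint states is exactly the global coupling that the variational bound is meant to avoid.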


A Variational Lagrangian 

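Following [2], we introduce $M$ variational functions $w(x|m)$ and lower-bound the true JS distance, starting from

$$J(q) = -\sum_m \sum_x q_0(m) q^m(x) \ln\left[ \frac{q_0(m)}{w(x|m)} \, \frac{w(x|m) q(x)}{q_0(m) q^m(x)} \right]$$

$$= \sum_m \sum_x q_0(m) q^m(x) \ln w(x|m) - \sum_m q_0(m) \ln q_0(m) - \sum_m \sum_x q_0(m) q^m(x) \ln \frac{w(x|m) q(x)}{q_0(m) q^m(x)}.$$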


Now replace $M$ of the $-\ln$ terms with the lower bound $-\ln x \ge -\nu x + \ln \nu + 1$ obtained from the Legendre dual of the logarithm to find

$$J(q) \ge J(q, w, \nu) = \sum_m \sum_x q_0(m) q^m(x) \ln w(x|m) - \sum_m q_0(m) \ln q_0(m) - \sum_m \nu_m \sum_x w(x|m) q(x) + \sum_m q_0(m) \ln \nu_m + 1.$$
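The bound on the logarithm used above follows from a one-line maximization. The short derivation below is a worked check under the only assumption that $x > 0$ and $\nu > 0$; the function name $f$ is introduced here for convenience.

```latex
% Legendre bound on -ln x, for x > 0 and nu > 0.
f(\nu) = -\nu x + \ln \nu + 1
f'(\nu) = -x + \frac{1}{\nu} = 0 \;\Longrightarrow\; \nu^{*} = \frac{1}{x}
f(\nu^{*}) = -1 + \ln\frac{1}{x} + 1 = -\ln x
f''(\nu) = -\frac{1}{\nu^{2}} < 0 \;\Longrightarrow\; f(\nu) \le f(\nu^{*}) = -\ln x
```

The bound is therefore tight at $\nu = 1/x$, which is why maximizing over the $\nu_m$ tightens the approximation to $J$.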




We optimize over $w$ and $\nu$ to maximize this lower bound. To further aid in distributing the algorithm we restrict the class of variational $w(x|m)$ to products: $w(x|m) = \prod_i w_i(x_i|m)$. For this choice

$$J(q, w, \nu) = \sum_m q_0(m) \Big\{ B^{m,m} - \sum_{\tilde m} A^{m,\tilde m} \nu_{\tilde m} + \ln \nu_m \Big\} + S[q_0] + 1 \tag{1.3}$$

where $A^{m,\tilde m} = \prod_i A_i^{m,\tilde m}$ with $A_i^{m,\tilde m} = \sum_{x_i} q_i^m(x_i) w_i(x_i|\tilde m)$, and $B^{m,m} = \sum_i B_i^{m,m}$ with $B_i^{m,m} = \sum_{x_i} q_i^m(x_i) \ln w_i(x_i|m)$ (footnote 3). At any temperature $T$ the variational Lagrangian which must be minimized with respect to $q$, $w$ and $\nu$ (subject to appropriate positivity and normalization constraints) is then

$$L(q, w, \nu) = \sum_m q_0(m) \mathcal{L}(q^m) - T J(q, w, \nu). \tag{1.4}$$
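The bound (1.3) can be compared against the exact JS distance on a problem small enough to enumerate. In the sketch below the array layout (`q[m][i][xi]`, `w[m][i][xi]`), the function names, and the random test distributions are all assumptions made for illustration. The variational value needs only per-variable sums, while the exact value needs the whole joint space.

```python
import itertools
import math
import random

def variational_js(q0, q, w, nu):
    """J(q, w, nu) of Eq. (1.3); q[m][i][xi] = q_i^m(x_i), w[m][i][xi] = w_i(x_i|m)."""
    M, d = len(q0), len(q[0])
    total = 0.0
    for m in range(M):
        B = sum(q[m][i][xi] * math.log(w[m][i][xi])
                for i in range(d) for xi in range(len(q[m][i])))
        coupling = 0.0
        for mt in range(M):
            A = 1.0  # A^{m,mt} = prod_i sum_{x_i} q_i^m(x_i) w_i(x_i|mt)
            for i in range(d):
                A *= sum(q[m][i][xi] * w[mt][i][xi]
                         for xi in range(len(q[m][i])))
            coupling += A * nu[mt]
        total += q0[m] * (B - coupling + math.log(nu[m]))
    S_q0 = -sum(p * math.log(p) for p in q0 if p > 0)
    return total + S_q0 + 1.0

def exact_js(q0, q):
    """Exact J(q) by enumerating the joint space (feasible only for tiny d)."""
    d = len(q[0])
    J = 0.0
    for x in itertools.product(*[range(len(q[0][i])) for i in range(d)]):
        pm = [math.prod(q[m][i][x[i]] for i in range(d)) for m in range(len(q0))]
        mix = sum(w0 * p for w0, p in zip(q0, pm))
        J -= sum(w0 * p * math.log(mix / p) for w0, p in zip(q0, pm) if p > 0)
    return J

def random_dist(n, rng):
    """Strictly positive random distribution over n values."""
    raw = [rng.random() + 0.1 for _ in range(n)]
    s = sum(raw)
    return [r / s for r in raw]

rng = random.Random(0)
M, d, n = 3, 4, 2
q0 = random_dist(M, rng)
q = [[random_dist(n, rng) for _ in range(d)] for _ in range(M)]
gaps = []
for _ in range(200):
    w = [[random_dist(n, rng) for _ in range(d)] for _ in range(M)]
    nu = [rng.uniform(0.1, 20.0) for _ in range(M)]
    gaps.append(exact_js(q0, q) - variational_js(q0, q, w, nu))
```

Every entry of `gaps` should be non-negative, since the bound holds for any positive $w$ and $\nu$; how small the gap can be made depends on the optimization over $w$ and $\nu$.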


1.3 Minimizing the Lagrangian 


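Even if $x \in X$ is a discrete quantity (as in the cases we consider here) the optimization variables $q$ determined by minimizing $L$ for a fixed $\lambda$ are continuous so that gradient methods may be applied. Optimizing for the variational parameters $w$ and $\nu$ we find

$$\nu_m^{-1} = \frac{1}{q_0(m)} \sum_{\tilde m} q_0(\tilde m) A^{\tilde m, m} \tag{1.5}$$

$$w_i(x_i|m) \propto \frac{q_0(m) q_i^m(x_i)}{\nu_m} \left[ \sum_{\tilde m} q_0(\tilde m) q_i^{\tilde m}(x_i) \frac{A^{\tilde m, m}}{A_i^{\tilde m, m}} \right]^{-1} \tag{1.6}$$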
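The update (1.5) can be checked in isolation. The sketch below assumes the same nested-list layout as before (`q[m][i]`, `w[m][i]`) and uses hypothetical helper names. It computes $\nu_m$ from (1.5), confirms that it beats a perturbed $\nu$ on the $\nu$-dependent part of $J(q, w, \nu)$, and reproduces the uniform-$w$ case, where $\nu_m = q_0(m)|X|$.

```python
import math

def overlap(q_m, w_m):
    """A^{m~,m} = prod_i sum_{x_i} q_i^{m~}(x_i) w_i(x_i|m)."""
    A = 1.0
    for qi, wi in zip(q_m, w_m):
        A *= sum(a * b for a, b in zip(qi, wi))
    return A

def optimal_nu(q0, q, w):
    """nu_m from Eq. (1.5): 1/nu_m = (1/q0(m)) sum_{m~} q0(m~) A^{m~,m}."""
    M = len(q0)
    nu = []
    for m in range(M):
        c_m = sum(q0[mt] * overlap(q[mt], w[m]) for mt in range(M))
        nu.append(q0[m] / c_m)
    return nu

def nu_terms(q0, q, w, nu):
    """The part of J(q, w, nu) that depends on nu."""
    M = len(q0)
    val = 0.0
    for m in range(M):
        c_m = sum(q0[mt] * overlap(q[mt], w[m]) for mt in range(M))
        val += -nu[m] * c_m + q0[m] * math.log(nu[m])
    return val

# Hypothetical two-component mixture over two binary variables.
q0 = [0.3, 0.7]
q = [[[0.9, 0.1], [0.2, 0.8]], [[0.4, 0.6], [0.7, 0.3]]]
w = [[[0.8, 0.2], [0.3, 0.7]], [[0.5, 0.5], [0.6, 0.4]]]
nu_star = optimal_nu(q0, q, w)
nu_off = [1.2 * nu_star[0], 0.8 * nu_star[1]]

# Uniform w: every overlap equals 1/|X| with |X| = 4 joint states here.
uniform = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(2)]
nu_uniform = optimal_nu(q0, q, uniform)
```

The $\nu$-dependent part of the bound is strictly concave in each $\nu_m$, so the stationary point given by (1.5) is its unique maximizer for fixed $q$ and $w$.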


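The dependence of $L$ on $q_0(m)$ is particularly simple: $L(q, w, \nu) = \sum_m q_0(m) \mathcal{E}(m) - T(S(q_0) + 1)$, where

$$\mathcal{E}(m) = E_{q^m}[G] - T \Big( S[q^m] + B^{m,m} - \sum_{\tilde m} A^{m,\tilde m} \nu_{\tilde m} + \ln \nu_m \Big).$$

Thus the mixture weights are Boltzmann distributed:

$$q_0(m) = \frac{\exp(-\mathcal{E}(m)/T)}{\sum_{m'} \exp(-\mathcal{E}(m')/T)}. \tag{1.7}$$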
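Evaluating (1.7) directly can overflow at the low temperatures used later, so a practical implementation would likely shift the exponents first. The sketch below is one such implementation; the function name and the example energies are assumptions made for illustration.

```python
import math

def mixture_weights(energies, T):
    """q0(m) = exp(-E(m)/T) / sum_m' exp(-E(m')/T), computed stably."""
    scaled = [-e / T for e in energies]
    shift = max(scaled)                      # subtract the largest exponent
    unnorm = [math.exp(s - shift) for s in scaled]
    Z = sum(unnorm)
    return [u / Z for u in unnorm]

energies = [1.0, 1.2, 3.0]                   # hypothetical values of E(m)
warm = mixture_weights(energies, T=1.0)
cold = mixture_weights(energies, T=0.01)
```

At `T=1.0` all three components keep appreciable weight, while at `T=0.01` nearly all of the weight moves to the component with the lowest $\mathcal{E}(m)$.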


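The determination of $q_i^m(x_i)$ is similar. The relevant terms in $L$ involving $q_i^m(x_i)$ are $L \approx q_0(m) \big[ \sum_{x_i} \mathcal{E}^m(x_i) q_i^m(x_i) - T S(q_i^m) \big]$ where

$$\mathcal{E}^m(x_i) = E_{q^m}[G|x_i] - T \Big( \ln w_i(x_i|m) - \sum_{\tilde m} \frac{A^{m,\tilde m}}{A_i^{m,\tilde m}} w_i(x_i|\tilde m) \nu_{\tilde m} \Big).$$

The conditional expectation $E_{q^m}[G|x_i]$ is $\sum_{x_{\setminus i}} G(x_i, x_{\setminus i}) q^m_{\setminus i}(x_{\setminus i})$ where $x_{\setminus i} = [x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_N]$ and $q^m_{\setminus i}(x_{\setminus i}) = \prod_{j \ne i} q_j^m(x_j)$. The mixture probabilities are thus determined as

$$q_i^m(x_i) = \frac{\exp(-\mathcal{E}^m(x_i)/T)}{\sum_{x_i'} \exp(-\mathcal{E}^m(x_i')/T)}. \tag{1.8}$$

Footnote 3: Note that if $w_i(x_i|m) = 1/|X_i|$ is uniform across $X_i$ then $A_i^{\tilde m, m} = 1/|X_i|$ and $B_i^{m,m} = -\ln|X_i|$. Maximizing over $\nu_m$ we find that $J(q, w = 1/|X|, \nu = \nu^*) = 0$. Thus, maximizing with respect to $w$ increases the JS distance from 0.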

Note that these results require minimal communication between agents. Assign a 0 agent to manage the determination of $q_0(m)$ and $(i, m)$ agents to manage the determination of $q_i^m(x_i)$. The $M$ $(i, m)$ agents for a fixed $i$ communicate their $w_i(x_i|m)$ to determine $A_i^{m,\tilde m}$. These results along with the $B_i^{m,m}$ from each $(i, m)$ agent are then forwarded to the 0 agent, who forms $A^{m,\tilde m}$ and $B^{m,m}$ and broadcasts them back to all $(i, m)$ agents.


Updating Lagrange Multipliers: In order to satisfy the constraints we must also update the Lagrange multipliers. To minimize communication between agents this is done in the simplest possible way, by taking the partial derivatives with respect to $\lambda$. This gives

$$\lambda_a \leftarrow \lambda_a + \alpha_\lambda E_{q^*}[c_a(x)] \tag{1.9}$$

where $\alpha_\lambda$ is a step size and $q^*$ is the minimizer of $L$ at the old settings of the multipliers.
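The multiplier update (1.9) amounts to one line of arithmetic per constraint. In the sketch below the function name and the example violation values are hypothetical; the step size 0.5 matches the value reported in the experiments.

```python
def update_multipliers(lam, expected_violation, step):
    """lambda_a <- lambda_a + step * E_{q*}[c_a(x)] for every constraint a."""
    return [l + step * c for l, c in zip(lam, expected_violation)]

lam = [1.0, 1.0, 1.0]
expected_violation = [0.0, 0.25, 0.9]   # hypothetical E_{q*}[c_a(x)] values
new_lam = update_multipliers(lam, expected_violation, step=0.5)
```

A constraint with zero expected violation keeps its multiplier, while the most strongly violated constraint receives the largest increase.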


Estimation of Conditional Expectations: All update rules require estimation of conditional expectations with some variables clamped to particular values. These are evaluated exactly if a closed form expression is available, or estimated with Monte Carlo sampling if no simple closed form exists. For the problems addressed here the expectations may be evaluated in closed form, but Monte Carlo sampling can also be used [6, 8].


1.4 Experiments 

We test the method on two different problems: a k-sat constraint satisfaction problem having multiple feasible solutions, and an unconstrained optimization of an NK function.


1.4.1 k-sat

The k-sat problem is perhaps the best studied CSP [5]. The goal is to assign $N$ binary variables values so that $C$ clauses are satisfied. The $a$th clause involves $k$ variables labeled by $v_{a,j} \in [1, N]$ (for $j \in [1, k]$), and $k$ binary values associated with each $a$ and labeled by $\sigma_{a,j}$. The $a$th clause is satisfied iff $c_a(x) = \bigvee_{j=1}^{k} [x_{v_{a,j}} = \sigma_{a,j}]$ is true. Accordingly we write $G(x, \lambda) = \lambda^\top c(x)$ where $\lambda$ and $c$ are vectors of length $C$ whose $a$th components are $\lambda_a$ and $c_a(x)$ respectively.

Figure 1.1: (a) Evolution of Lagrangian value (solid line), expected constraint violation (dotted line), and constraint violations of most likely configuration (dashed line). (b) $P(G)$ after minimizing the Lagrangian for the first 3 multiplier settings. At termination $P(G) = \delta(G)$.


Noting that the $a$th clause is violated only when all $x_{v_{a,j}} = \bar\sigma_{a,j}$ (with $\bar\sigma \equiv$ not $\sigma$), the Lagrangian over product distributions can be written as $\mathcal{L}(q) = \lambda^\top c(q) - T S(q)$ where $c(q)$ is the $C$-vector of expected constraint violations whose $a$th component is $c_a(q) = \prod_{j=1}^{k} q_{v_{a,j}}(\bar\sigma_{a,j})$. The only communication required to evaluate $G$ and its conditional expectations is between agents appearing in the same clause. Typically, this communication network is sparse; for the $N = 100$, $k = 3$, $C = 430$ variable problem each agent interacts with only 6 other agents on average.
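The closed form for $c_a(q)$ can be compared with a brute-force expectation on a small instance. In the sketch below the clause encoding (a list of `(variable, sigma)` pairs with distinct variables per clause and 0-based indices), the function names, and the example instance are assumptions made for illustration.

```python
import itertools
import math

def expected_violations(clauses, q):
    """c_a(q) = prod_j q_{v_{a,j}}(not sigma_{a,j}) for every clause a."""
    return [math.prod(q[v][1 - s] for v, s in clause) for clause in clauses]

def violated(clause, x):
    """A clause is violated only when every literal takes the wrong value."""
    return all(x[v] != s for v, s in clause)

def expected_violations_brute(clauses, q):
    """The same expectations, obtained by enumerating all joint states."""
    out = [0.0] * len(clauses)
    for x in itertools.product([0, 1], repeat=len(q)):
        px = math.prod(q[i][xi] for i, xi in enumerate(x))
        for a, clause in enumerate(clauses):
            if violated(clause, x):
                out[a] += px
    return out

# Hypothetical 3-sat instance with four variables and two clauses.
clauses = [[(0, 1), (1, 0), (2, 1)], [(1, 1), (2, 0), (3, 1)]]
q = [[0.2, 0.8], [0.6, 0.4], [0.5, 0.5], [0.9, 0.1]]
fast = expected_violations(clauses, q)
slow = expected_violations_brute(clauses, q)
```

The closed form touches only the $k$ agents in each clause, which is the locality that keeps the communication network sparse.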

For any fixed setting of the Lagrange multipliers, the Lagrangian is minimized by iterating the equations (1.5)-(1.8). Rather than update a single agent at a time we randomly select a subset of variables no two of which impact each other and update the subset simultaneously. The minimization is terminated at a local minimum $q^*$. If all constraints are satisfied at $q^*$ we return the solution $x^* = \arg\max_x q^*(x)$; otherwise the Lagrange multipliers are updated according to Eq. (1.9). In the present context, this updating rule offers a number of benefits. Firstly, those constraints which are violated most strongly have their penalty increased the most, and consequently, the agents involved in those constraints are most likely to alter their state. Secondly, the Lagrange multipliers contain a history of the constraint violations (since we keep adding to $\lambda$) so that when the agents coordinate on their next move they are unlikely to return to a previously violated state. Lastly, rescaling the Lagrangian by the norm of $\lambda$ gives $\hat{\mathcal{L}}(q) = \hat\lambda^\top c(q) - T S(q)/\|\lambda\|$ where $\hat\lambda = \lambda/\|\lambda\|$, so that updating the Lagrange multipliers can be seen as defining a cooling schedule where $T \to T/\|\lambda\|$. The parameter $\alpha_\lambda$ thus governs the overall rate of cooling. We used $\alpha_\lambda = 0.5$.
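One simple way to pick a random subset of variables, no two of which share a clause, is a greedy pass over a random ordering. The text does not specify the selection procedure, so the greedy rule, the function name, and the clause encoding in the sketch below are all assumptions.

```python
import random

def conflict_free_subset(clauses, n_vars, rng):
    """Pick variables in random order, skipping any that share a clause
    with a variable already picked."""
    neighbours = [set() for _ in range(n_vars)]
    for clause in clauses:
        vars_in_clause = [v for v, _ in clause]
        for v in vars_in_clause:
            neighbours[v].update(u for u in vars_in_clause if u != v)
    order = list(range(n_vars))
    rng.shuffle(order)
    chosen, blocked = [], set()
    for v in order:
        if v not in blocked:
            chosen.append(v)
            blocked.add(v)
            blocked.update(neighbours[v])
    return chosen

# Hypothetical instance: clauses are lists of (variable, sigma) pairs.
clauses = [[(0, 1), (1, 0), (2, 1)],
           [(2, 0), (3, 1), (4, 1)],
           [(5, 1), (6, 0), (7, 1)]]
chosen = conflict_free_subset(clauses, 8, random.Random(0))
```

Variables in `chosen` never appear together in a clause, so their conditional expectations do not depend on one another and their distributions can be updated in the same step.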

Fig. 1.1 presents results for a 100 variable $k = 3$ problem using a single mixture. The problem is the satisfiable formula uf100-01.cnf from SATLIB (www.satlib.org). It was generated with the ratio of clauses to variables being near the phase transition, and consequently has few solutions. Fig. 1.1(a) shows the variation of the Lagrangian, the expected number of constraint violations, and the number of constraints violated in the most probable state $x_{mp} \equiv \arg\max_x q(x)$ as a function of the number of iterations. The starting state is the maximum entropy configuration, and the starting temperature is $T = 0.0015$. The iterations at which the Lagrange multipliers are updated are indicated by vertical dashed lines which are clearly visible as discontinuities in the Lagrangian values. To show the stochastic underpinnings of the algorithm we plot in Fig. 1.1(b) the probability density of the number of constraint violations obtained as $\mathrm{Prob}(G) = \sum_x q(x) \delta(G - G(x, \vec{1}))$.
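For a small instance the density $\mathrm{Prob}(G)$ can be tabulated exactly. The sketch below does so by enumeration; the function name, the clause encoding, and the example instance are assumptions, and for problems of the size used in the experiments sampling from $q$ would be needed instead.

```python
import itertools
import math

def violation_distribution(clauses, q):
    """Prob(G) = sum_x q(x) delta(G - G(x, 1)), where G(x, 1) is the
    number of violated clauses (all multipliers set to 1)."""
    dist = [0.0] * (len(clauses) + 1)
    for x in itertools.product([0, 1], repeat=len(q)):
        px = math.prod(q[i][xi] for i, xi in enumerate(x))
        g = sum(all(x[v] != s for v, s in clause) for clause in clauses)
        dist[g] += px
    return dist

# Hypothetical instance: clauses are lists of (variable, sigma) pairs.
clauses = [[(0, 1), (1, 0), (2, 1)], [(1, 1), (2, 0), (3, 1)]]
q = [[0.2, 0.8], [0.6, 0.4], [0.5, 0.5], [0.9, 0.1]]
dist = violation_distribution(clauses, q)
mean_violations = sum(g * p for g, p in enumerate(dist))
closed_form = sum(math.prod(q[v][1 - s] for v, s in clause) for clause in clauses)
```

The mean of this distribution equals the sum of the expected constraint violations, and a distribution concentrated on `dist[0]` corresponds to $P(G) = \delta(G)$.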

Results on a larger problem with more mixtures are shown in Fig. 1.2(a). This is the 250 variable/1065 clause problem uf250-01.cnf from SATLIB with the first 50 clauses removed so that the problem has multiple solutions. The initial temperature is 0.1. We plot the number of constraints violated in the most probable state of each mixture as a function of the number of updates, as well as the expected number of violated constraints. After 8000 steps three distinct solutions have been found along with a fourth solution which violates a single constraint.




Figure 1.2: (a) The solid colored curves show the number of unsatisfied clauses in the best $x_{mp}$ configurations of each of the 4 mixtures vs iterations. The solid black line plots the expected number of violations, and the dashed black line shows the approximation to the JS distance. (b) The solid colored curves show the evolution of the $G$ value of the best $x_{mp}$ configurations for each of 5 mixtures versus number of iterations. The dashed black line shows the corresponding approximation to the JS distance.

1.4.2 Minimization of NK Functions

The NK model defines a family of tunably difficult optimization problems [3]. The energy of $N$ binary variables is defined as the average of $N$ contributions local to each variable $x_i$ and involving $K$ other randomly chosen variables $x_i^1, \dots, x_i^K$: $G(x) = N^{-1} \sum_{i=1}^{N} E_i(x_i; x_i^1, \dots, x_i^K)$. For each of the $2^{K+1}$ local configurations $E_i$ is assigned a value drawn uniformly from 0 to 1. Fig. 1.2(b) plots the energy of a 5 mixture model for a multi-modal $N = 300$, $K = 2$ function. At termination 5 distinct configurations are obtained with the nearest pair of solutions having Hamming distance 12.
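An NK function of the kind described above can be generated in a few lines. The sketch below is illustrative: the function name `make_nk`, the lookup-table representation, and the choice to draw the $K$ neighbours without replacement from the other variables are assumptions.

```python
import itertools
import random

def make_nk(N, K, rng):
    """Build G(x) = (1/N) sum_i E_i(x_i; x_i^1..x_i^K) with random
    neighbours and uniform(0, 1) table entries."""
    neighbours = [rng.sample([j for j in range(N) if j != i], K)
                  for i in range(N)]
    tables = [{cfg: rng.random()
               for cfg in itertools.product([0, 1], repeat=K + 1)}
              for _ in range(N)]

    def G(x):
        total = 0.0
        for i in range(N):
            local = (x[i],) + tuple(x[j] for j in neighbours[i])
            total += tables[i][local]
        return total / N

    return G, neighbours, tables

G, neighbours, tables = make_nk(N=10, K=2, rng=random.Random(1))
x = [0, 1] * 5
value = G(x)
```

Each table holds $2^{K+1}$ entries, so increasing $K$ makes the landscape more rugged while the cost of evaluating $G$ stays linear in $N$.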

1.5 Conclusion 

A distributed constrained optimization framework based on probability Lagrangians has been presented. Motivation for the framework was drawn from an extension of full-rationality game theory to bounded rational agents. An algorithm was developed and demonstrated on two problems. The results show a promising, highly distributed, off-the-shelf approach to constrained optimization.